Comparison of physical models and artificial
intelligence for prediction of flood levels


Abstract 


Hydrology has traditionally relied on conventional methods for flood
level forecasting. However, such forecasts can suffer from accuracy
problems caused by the nonlinear behavior of floods and by the omission
of relevant variables such as flow, water level and precipitation.
Consequently, some researchers have turned to unconventional methods
based on artificial intelligence models to forecast floods more
precisely and rigorously. This paper compares a one-dimensional flow
routing (transit) model developed in HEC-RAS with an artificial
intelligence model based on artificial neural networks, developed in
MATLAB, to predict floods. The results were analyzed using six
statistical indicators: mean absolute error (MAE), mean squared error
(MSE), mean absolute percentage error (MAPE), root mean squared error
(RMSE), Pearson correlation coefficient (CC), and concordance
correlation coefficient (ρc). In addition, the coefficient of
efficiency (CE) was calculated using an online tool called HydroTest.
The analysis shows that the forecast models based on neural networks
give accurate results, given their closeness to the real data: MAPE
between 11.95 and 12.51, CC between 0.90 and 0.92, ρc between 0.84 and
0.87, and a coefficient of efficiency larger than 0.8. The study was
conducted on a section of the upper Bogotá River, in Colombia, between
the Puente Florencia and Tocancipá hydrological stations. Flow data
were obtained from the Regional Autonomous Corporation of Cundinamarca
(CAR), from September 2009
to October 2013.


Introduction


Flooding is a natural phenomenon that occurs when rainfall is so
frequent or so intense that the soil's absorption capacity is exceeded,
causing water to leave its course and spread into adjacent areas (SDAB,
2009). The consequences are more severe in populated urban areas, where
the damage is not only environmental but also social and economic.
Government entities must then divert large amounts of resources
originally allocated to other sectors (i.e. education and health) to
recover flooded spaces and infrastructure (CAR, 2016). World Bank data
(Banco Mundial Colombia, 2012) show that flooding causes 43% of
destroyed homes and about 10% of the loss of human lives.


The two phenomena of climatic variability that represent the greatest
threat in Colombia are “El Niño” and “La Niña.” The former is
characterised by droughts and a lack of water, thereby producing forest
fires, whilst “La Niña” involves greater soil saturation, which causes
events such as landslides and flash floods, especially in Colombia's
Andean, Caribbean and Pacific regions. A state of economic, social and
ecological emergency was declared throughout Colombia in January 2011
due to the devastating effects of flooding. The Corporación Autónoma
Regional (CAR), the entity that manages the Bogotá River basin, argues
that probabilistic models are needed for estimating climatic variability
and identifying any increase in the volume of rivers, since they can be
used to generate natural disaster alerts and provide useful information
for decisions regarding emergency prevention (CAR, 2016).


Hydrology has traditionally used one-dimensional methods for
forecasting flows, in which flooding is estimated by linear regression
(Pandey & Nguyen, 1999). This method measures the relationship
between the dependent and independent variables (Weisberg, 2005).
The approach has limitations in terms of prediction, not only because
of climate change (Huffman, 2001) and the difficulties regarding
calibration and the robust optimisation tools needed (Kia et al.,
2011), but also because these phenomena are not linear, which makes
this type of predictive model unsuitable (Dawson, Abrahart, Shamseldin,
& Wilby, 2006; Aqil, Kita, Yano, & Nishiyama, 2007). Thus, while
traditional methods have proved extremely helpful in predicting floods,
researchers are now studying models that offer greater forecasting
accuracy.


Another way of forecasting floods is to use physical models based on
hydraulic principles, which explain river flow patterns through
physical laws expressed as differential equations. The Saint-Venant
equations have proved useful in representing water flow. However, it
has been shown that they sometimes produce unstable solutions due to a
large accumulation of errors when the flow depth increases rapidly,
thereby requiring more complex mathematics and more precise modelling
(Amarís, Guerrero, & Sanchez, 2015).


Another difficulty related to physical models is the amount of
information they require in terms of hydro-meteorological variables
(flow, water level, rainfall), in addition to the geological and
topographic aspects of a particular channel, i.e. bathymetry
(underwater terrain depths and shapes), soil types, flow curves and
runoff parameters (Merwade, Cook, & Coonrod, 2008; Kia et al., 2011).
The foregoing limits the use of this type of model, since certain
basins have not been characterised in terms of their storage capacity,
water catchment, and the likely flood zones along the rivers in
question (Werner, Gallagher, & Weeks, 2006; Park, Joo, & Kim, 2012;
Callow & Boggs, 2013).


The physical model analysed in this research is a flow routing
(transit) model, which predicts changes in the magnitude, velocity and
shape of a flow wave as a function of time (hydrograph) at one or more
points along a river or canal (Chow, Maidment, & Mays, 1994). This
one-dimensional modelling involved using the Hydrologic Engineering
Center's (CEIWR-HEC) River Analysis System (HEC-RAS) software,
created by the US Army Corps of Engineers (US Army Corps Engineers &
Hydrologic Engineering Center, 2016). This software has been used in
work involving hydraulic simulation (Manfreda et al., 2014; Guida,
Swanson, Remo, & Kiss, 2015; Dimitriadis et al., 2016).


HEC-RAS software has also been used for analysing the risk of flooding
with 3D, 2D and 1D hydraulic simulation systems (Zazo, Molina, &
Rodríguez-González, 2015).


Prediction models for future events have also been developed by
integrating artificial intelligence techniques, which have a flexible
mathematical structure capable of modelling complex nonlinear
relationships between input and output data that are difficult to
describe using physical equations (Seckin, Cobaner, Yurtal, &
Haktanir, 2013). Artificial neural networks (ANN) are one of the most
widely used artificial intelligence techniques for flood forecasting
worldwide. They simulate the brain's functioning for solving problems
through mathematical models inspired by neurological processes (Kalteh,
2013; Wang, Chau, Cheng, & Qiu, 2009). Another technique involves
linking ANN with an adaptive network-based fuzzy inference system
(ANFIS), which is used for building forecast models (Aqil et al.,
2007). Table 1 gives some cases of ANN being used as a prediction
system.


Table 1. Examples of using artificial neural network (ANN) models.


Reference | Application
Kalteh (2013) | Developing prediction models using artificial intelligence techniques
Nastos, Paliatsos, Koukouletsos, Larissi, & Moustris (2014) | Predicting daily maximum rainfall using multiple linear regression models and artificial neural networks
Tisseuil, Vrac, Lek, & Wade (2010) | Evaluating statistical (downscaling) models, such as ANN, for predicting climate change considering hydrological resources
Yilmaz, Imteaz, & Jenkins (2011) | Predicting snow-related catchment flows, evaluating runoff data based on meteorological history
Taormina, Chau, & Sivakumar (2015) | Predicting a river's flow with base flow separation and binary-coded swarm optimisation
Deo & Sahin (2016) | Making an extreme learning machine (ELM) for simulating monthly mean flow levels in eastern Queensland, Australia, comparing the performance of flow prediction models with that of artificial neural networks
Appelhans, Mwangomo, Hardy, Hemp, & Nauss (2015) | Predicting temperature patterns on Mount Kilimanjaro, using machine learning approaches involving 14 learning algorithms
Deo & Sahin (2015) | Predicting the monthly standardised precipitation index (SPI) and standardised precipitation evapotranspiration index (SPEI)


Artificial intelligence techniques are currently used as a reference
in research dealing with predicting future events because they can
emulate a particular phenomenon's non-linear behaviour, thereby
resulting in a more successful forecast (Zou, Xia, Yang, & Wang, 2007).
In the field of hydrology, these techniques help in making appropriate
decisions regarding water use.


This article compares a physical model to an intelligent model for
predicting flood levels along a stretch of the Bogotá River basin
(Colombia) between the Puente Florencia (satellite) and Tocancipá
hydrological stations.


Materials and methods 


The HEC-RAS hydrological model 


This tool enables hydraulic modelling of steady and unsteady flow in
artificial canals and natural channels, including rivers (US Army Corps
Engineers & Hydrologic Engineering Center, 2006). The software's
hydraulic simulation is based on deterministic differential equations
that enable the prediction of the water level dynamics that occur
during high-rainfall events that cause flooding. Flood levels are
defined by cross-sectional profiles. The simulated dynamics of the
water and channel behaviour include cross-sections of any geometry
along a channel, different water depths and variable flow along a
channel under sub-critical or super-critical flow conditions, and the
hydraulic effects of natural or artificial transverse obstacles in the
channel (Sarhadi, Soltani, & Modarres, 2012; Mohammadi, Nazariha, &
Mehrdadi, 2014).


HEC-RAS software (for the simulation model used in this research) uses
a continuity equation (US Army Corps Engineers & Hydrologic
Engineering Center, 2006) that describes the conservation of mass for a
one-dimensional system and includes a storage term:

\[ \frac{\partial A}{\partial t} + \frac{\partial S}{\partial t} + \frac{\partial Q}{\partial x} - q_l = 0 \qquad (1) \]

where x = distance along the channel, t = time, Q = flow, A =
cross-section area, S = storage and q_l = lateral input per unit of
distance.


The following calibration parameters were used in the modelling: 


Hydrographs: graphs enabling the flow rate or flow to be observed at a 
given point on the current (Chow et al., 1994). 


Flow curves or calibration curves: graphical representations of the
relationship between the water level and its respective flow (Salazar &
Chaparron, 1990).


Cross-sections: these define a river's shape and geometric
characteristics and must be topographically connected so that they
define the longitudinal profile.

\[ Q = K S_f^{1/2}, \qquad K = \frac{1}{n} A R^{2/3} \qquad (2) \]

where Q = flow, S_f = friction slope, K = conveyance of the
cross-section (written here in SI units), n = Manning coefficient of
roughness for the section, A = the section's flow area and R = the
section's hydraulic radius (area/wetted perimeter).
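

As a rough numerical illustration of the conveyance relation, the following Python sketch assumes the SI form of Manning's equation, Q = (1/n) A R^(2/3) S_f^(1/2). The function names, the rectangular section geometry and the slope are hypothetical values chosen for illustration; they are not data from this study.

```python
def conveyance(n, area, wetted_perimeter):
    """Conveyance K = (1/n) * A * R^(2/3), with R = A / P (SI units assumed)."""
    hydraulic_radius = area / wetted_perimeter
    return (1.0 / n) * area * hydraulic_radius ** (2.0 / 3.0)


def discharge(n, area, wetted_perimeter, friction_slope):
    """Flow Q = K * sqrt(S_f), in m^3/s when lengths are in metres."""
    return conveyance(n, area, wetted_perimeter) * friction_slope ** 0.5


# Hypothetical rectangular section: 10 m wide, 2 m deep (illustrative only).
width, depth = 10.0, 2.0
q = discharge(
    n=0.03,                               # within the 0.021 to 0.04 range used for calibration
    area=width * depth,                   # flow area A
    wetted_perimeter=width + 2.0 * depth, # bed plus two banks
    friction_slope=0.0001,                # assumed slope
)
print(f"Q = {q:.2f} m^3/s")
```

Because Q scales with 1/n, the choice of roughness coefficient directly controls the simulated flow for a given geometry, which is why n is treated as a calibration parameter.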


Manning's roughness coefficient: also called the roughness
coefficient, it enables a channel's runoff resistance to be estimated
(Ruberto, Carreras, & Depettris, 2003). When a channel's roughness
varies, a composite Manning coefficient (n_c) is computed: the main
channel is divided into N parts, each having a wetted perimeter P_i and
a roughness coefficient n_i:

\[ n_c = \left[ \frac{\sum_{i=1}^{N} P_i \, n_i^{1.5}}{P} \right]^{2/3} \qquad (3) \]

where n_c = composite roughness coefficient; P = main channel wetted
perimeter; P_i = wetted perimeter of part i, and n_i = roughness
coefficient of part i.
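

The composite roughness calculation can be sketched in a few lines of Python. This is a minimal illustration that assumes the perimeter-weighted form n_c = [Σ(P_i n_i^1.5) / P]^(2/3) and that the parts together make up the whole main channel perimeter (P = Σ P_i); the function name and the sample values are hypothetical.

```python
def composite_roughness(wetted_perimeters, roughness_coefficients):
    """Composite Manning n_c from per-part wetted perimeters P_i and roughness n_i.

    Assumes the total wetted perimeter P is the sum of the parts.
    """
    if len(wetted_perimeters) != len(roughness_coefficients):
        raise ValueError("one roughness coefficient is needed per part")
    total_perimeter = sum(wetted_perimeters)
    weighted = sum(
        p * n ** 1.5 for p, n in zip(wetted_perimeters, roughness_coefficients)
    )
    return (weighted / total_perimeter) ** (2.0 / 3.0)


# Hypothetical channel: rougher banks (2 m each) and a smoother bed (10 m).
n_c = composite_roughness([2.0, 10.0, 2.0], [0.04, 0.025, 0.04])
print(f"n_c = {n_c:.4f}")
```

The result always lies between the smallest and largest n_i, and it reduces to a single value when every part has the same roughness.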


Research using this software usually evaluates how well a hydraulic
model predicts floods, in order to identify vulnerable areas, critical
infrastructure and the affected land use value (Sarhadi et al., 2012;
Zazo et al., 2015). Studies have demonstrated that HEC-RAS modelling
enables different scenarios to be evaluated for forecasting areas of
flooding (Guida et al., 2015). This software is also used for optimising
the geometric representation of areas prone to flooding, which can
subsequently be visualised using a geographic information system (GIS)
(Sarhadi et al., 2012). Mohammadi et al. (2014) simulated flooding and
hydraulic conditions in flood areas for different return periods
(recurrence intervals) using HEC-RAS, HEC-GEORAS and GIS models in
a case study, presenting their results as risk analysis and flood damage.


Artificial neural network (ANN) model 


A standard ANN structure (Figure 1) consists of a set of neurons
organised into a hierarchy of layers (input, hidden and output)
constituting an autonomous functional system (Chen, Chen, Chou, &
Yang, 2010). The following elements can be identified in this type of
intelligent system: input and output variables, synaptic weights (the
intensity of interaction between neurons), and propagation, activation
and output functions (Komatsu et al., 2014). The number of layers and
neurons is one of the most important parameters in an ANN, since it
determines the system's efficiency.


Figure 1. Backpropagation in an artificial neural network (ANN)
structure (Chen et al., 2010).


One of the advantages of ANNs is that they are useful modelling tools
when the relationship between input data and output data is unknown
(which is why this type of model is called a black box) (Chau, Wu, &
Li, 2005; Wang, Wang, Lei, Jiang, & Song, 2011), enabling complex
systems such as hydrological processes to be modelled mathematically
(Dawson et al., 2006). Another benefit is that ANNs can produce an
output from a specific combination of inputs and can handle non-linear
data (Santillán, Fraile-Ardanuy, & Toledo, 2014; Cervantes-Osornio,
Arteaga-Ramírez, Vázquez-Peña, Ojeda-Bustamante, & Quevedo-Nolasco,
2013).


Available data 


The Bogotá River basin is located in Colombia's Cundinamarca
department. It has a 5 891 km² surface area, representing around 32%
of the department's total surface area. The Bogotá River is the basin's
main river; it runs for a total of 308 kilometres from an altitude of 3 300
metres above sea level (masl) in the municipality of Villapinzón to its
outlet into the Magdalena River at 280 masl in the municipality of
Girardot (CAR, 2006).


The Bogotá River is divided into 3 sub-basins: upper, middle and lower.
The stretch being studied was in the upper basin (Figure 2) between the
Puente Florencia hydrological station in the municipality of Gachancipá
(upstream) and the Tocancipá station in the municipality of Tocancipá
(downstream). This stretch is characterised by the availability of
hourly flow records and by the stability of such records (i.e.
there is no reservoir or other large water body nearby that significantly
alters the basin's hydrological pattern).


Figure 2. The study segment (authors, with cartography supplied by
the CAR and IGAC). CAR: Regional Autonomous Corporation (Corporación
Autónoma Regional); IGAC: Agustín Codazzi Geographical Institute
(Instituto Geográfico Agustín Codazzi).


Twelve small sub-basins were identified as contributing flows to the
section being studied; added together, they considerably increase the
river's level during periods of heavy rainfall. The rational formula
method (which estimates the peak runoff rate at a specific location as
a function of drainage area, runoff coefficient, and mean rainfall
intensity) was used for calculating each sub-basin's contribution to
flow, using the water level curves corresponding to each sub-basin and
flow order (Horton, 1945).
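

To illustrate the rational formula, the Python sketch below assumes its usual SI form, Q = C i A / 3.6, with the rainfall intensity i in mm/h, the drainage area A in km² and the peak flow Q in m³/s. The runoff coefficients, intensities and areas are hypothetical and are not the values used for the twelve sub-basins of this study.

```python
def rational_peak_flow(runoff_coefficient, intensity_mm_per_h, area_km2):
    """Peak runoff Q = C * i * A / 3.6 in m^3/s.

    The factor 3.6 converts (mm/h * km^2) into m^3/s.
    """
    return runoff_coefficient * intensity_mm_per_h * area_km2 / 3.6


# Hypothetical sub-basins: (runoff coefficient C, intensity in mm/h, area in km^2).
sub_basins = [
    (0.35, 12.0, 8.5),
    (0.40, 10.0, 5.2),
    (0.30, 15.0, 3.1),
]

contributions = [rational_peak_flow(c, i, a) for c, i, a in sub_basins]
total_contribution = sum(contributions)
print(f"Total lateral contribution: {total_contribution:.2f} m^3/s")
```

Summing the contributions in this way mirrors how the flows of several small sub-basins add up to a noticeable lateral inflow along the study segment.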


Physical model produced using HEC-RAS software 


The physical model implemented in HEC-RAS required establishing the
calibration parameters with which the flows were simulated at the
output of the model. These parameters were the hydrographs from the
Puente Florencia and Tocancipá stations, flow curves or calibration
curves, cross-sections and Manning's roughness coefficient.


Data from April, May, October and November 2011 and 2013 were taken 
for simulating the (branch-network) flow model as flooding occurred 
during these dates due to heavy rainfall. 


Concerning the flow curves, a maximum of roughly 60 m³/s can occur at a
5-metre water level upstream at the Puente Florencia station, whereas a
maximum of roughly 50 m³/s can occur at a 5-metre water level
downstream at the Tocancipá hydrological station.


Regarding the model's cross-sections, bathymetry was used for taking
measurements at points in the field; 151 sections from the study
segment were used, with distances varying from 100 to 800 metres in
length, depending on the shape of the channel (i.e. measurements were
made at shorter distances in areas having very tight curves).


Manning's roughness coefficient was calibrated based on constant friction 
with the surface and the surface with the least friction on the sides of the 
channel. 


Calibrating the one-dimensional hydraulic model simulated in HEC-RAS
began by identifying a simple hydrograph showing a wave without
distortions, during a period of time in which average flows occurred in
the study segment, using Manning coefficients ranging from 0.021 to
0.04 for all cross-sections (Santos, Cubillos, & Vargas, 2008; Cook &
Merwade, 2009). With these characteristics in mind, the period from the
12th to the 23rd of July 2010 was chosen, for which the Puente
Florencia station's hydrographs and the calculated flows from the 12
sub-basins were entered in the HEC-RAS database.


Three scenarios (January, April to June, and October) were simulated for
2011 and 2013 after calibrating Manning coefficients. The parameter
concerning the last cross-section (i.e. the Tocancipá station or the
model's output) was configured as a normal-depth condition with a value
of 0.0001, this being suitable for situations where flow approaches a
uniform rate (US Army Corps Engineers & Hydrologic Engineering Center,
2006).


Artificial neural network (ANN) model 


MATLAB Neural Network Toolbox (2013) software was used for ANN 
simulation, with flow data from the Puente Florencia station and from 
the 12 sub-basins along the study segment as input, and flows at the 
Tocancipá station as the model's output. 


The data were normalised to the range -1 to 1 (MathWorks, 2013). The
model's input consisted of the Puente Florencia station's flows and
those from the 12 sub-basins along the study segment. The model's
output was a vector of the flows calculated by the ANN for the Tocancipá
station.
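

The normalisation step can be sketched as a linear min-max mapping onto [-1, 1]. The Python below is a minimal illustration, not the toolbox routine itself; the function names and the sample flows are hypothetical, and fitting the minimum and maximum on the training series only is an assumption of this sketch.

```python
def fit_min_max(values):
    """Return the (minimum, maximum) used to scale a series."""
    return min(values), max(values)


def normalise(values, lo, hi):
    """Map values linearly so that lo -> -1 and hi -> 1."""
    if hi == lo:
        raise ValueError("a constant series cannot be scaled")
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


def denormalise(scaled, lo, hi):
    """Invert normalise() to recover flows in their original units."""
    return [(s + 1.0) * (hi - lo) / 2.0 + lo for s in scaled]


# Hypothetical hourly flows in m^3/s (illustrative only).
training_flows = [4.0, 12.5, 30.0, 21.0, 8.0]
lo, hi = fit_min_max(training_flows)
scaled = normalise(training_flows, lo, hi)
restored = denormalise(scaled, lo, hi)
```

Keeping the inverse mapping at hand matters because the network's output is also expressed on the normalised scale and has to be converted back to m³/s before it is compared with the observed flows.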


Accurately training the ANN and its forecasts involved dividing the data 
into two parts. Data from September 2009 to December 2012, including 
February, March, July, August and September 2013, were used for 
training (70%), whereas data (the remaining 30%) from January, April, 
May, June and October 2011 and 2013 were used for forecasting. 


MATLAB's Neural Network Toolbox was configured using
backpropagation training (Kia et al., 2011; Chen et al., 2010). The
Levenberg-Marquardt backpropagation algorithm (trainlm) was used as the
training function, this being the fastest algorithm for this type
of training with large networks; its fitting function performs better
for recognising a target system's patterns (MathWorks, 2013). The
following parameters were configured in the Toolbox in order to run the
model: a maximum of 2 000 iterations (epochs), a minimum gradient of
1e-05 and a maximum of 6 validation checks for evaluating the
model's quality.


A multilayer structure was used for training every scenario (Kia et al.,
2011; Siou, Johannet, Borrell, & Pistre, 2011), modifying the number of
layers (2 to 20) and neurons (2 to 50). Altogether, 168 scenarios were
trained and divided according to the propagation functions of their
layers: 85 had a sigmoid-sigmoid configuration and 83 a sigmoid-linear
configuration. A forecast was simulated for every scenario and a MATLAB
programme used the results to ascertain the model's efficiency.
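

The difference between the two configurations can be illustrated with a small forward pass written in plain Python. This is only a sketch under stated assumptions: the sigmoid is taken to be the hyperbolic tangent (which matches data scaled to [-1, 1]), the weights are random rather than trained, and the layer sizes (13 inputs for the upstream station plus 12 sub-basins, two hidden layers of 25 neurons, one output) are hypothetical.

```python
import math
import random


def random_layers(sizes, seed=0):
    """Build untrained (weights, biases) pairs for consecutive layer sizes."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [rng.uniform(-1, 1) for _ in range(n_out)]
        layers.append((weights, biases))
    return layers


def forward(inputs, layers, output_activation):
    """Propagate inputs; hidden layers use tanh, the output uses tanh or identity."""
    activations = inputs
    for index, (weights, biases) in enumerate(layers):
        sums = [
            sum(w * a for w, a in zip(row, activations)) + b
            for row, b in zip(weights, biases)
        ]
        is_output = index == len(layers) - 1
        if is_output and output_activation == "linear":
            activations = sums                             # sigmoid-linear
        else:
            activations = [math.tanh(s) for s in sums]     # sigmoid-sigmoid
    return activations


layers = random_layers([13, 25, 25, 1])
sample = [0.1 * k - 0.6 for k in range(13)]   # hypothetical normalised inputs
y_sig_sig = forward(sample, layers, "sigmoid")
y_sig_lin = forward(sample, layers, "linear")
```

With identical weights, the sigmoid-sigmoid output is simply the sigmoid of the sigmoid-linear output, so it is bounded to (-1, 1), whereas the linear output can extrapolate beyond the normalised range.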


The models' statistical evaluation criteria 


After the simulations had been made, resulting in the Tocancipá
station's output hydrographs for each period, these were compared to
real data for the same periods. The following six statistical indexes,
which have been used in most of the articles consulted as a method for
evaluating the performance of simulation models (Dawson, Abrahart, &
See, 2007), were used for data analysis: mean absolute error (MAE)
(Singhal & Swarup, 2011), mean squared error (MSE) (Gomes & Ludermir,
2013), mean absolute percentage error (MAPE) (Lewis, 1982),
root-mean-squared error (RMSE) (Singhal & Swarup, 2011), Pearson's
correlation coefficient (CC) (Lin, Hedayat, Bikas, & Yang, 2002), and
concordance correlation coefficient (ρc) (Lin, 2011).
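

For reference, the six indexes can be computed as in the Python sketch below. It assumes the standard textbook definitions, with population (1/n) variances and covariance and Lin's form of the concordance correlation coefficient; the function name and sample series are hypothetical, and MAPE requires non-zero observed flows.

```python
import math


def forecast_metrics(observed, simulated):
    """Return MAE, MSE, RMSE, MAPE (%), Pearson CC and concordance CCC."""
    n = len(observed)
    errors = [s - o for o, s in zip(observed, simulated)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mape = 100.0 * sum(abs(e / o) for e, o in zip(errors, observed)) / n

    mean_o = sum(observed) / n
    mean_s = sum(simulated) / n
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_s = sum((s - mean_s) ** 2 for s in simulated) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated)) / n

    cc = cov / math.sqrt(var_o * var_s)
    # CCC penalises both scatter and systematic shift away from the 1:1 line.
    ccc = 2.0 * cov / (var_o + var_s + (mean_o - mean_s) ** 2)
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape, "CC": cc, "CCC": ccc}


# Hypothetical observed and simulated flows in m^3/s.
observed = [10.0, 20.0, 30.0, 40.0]
simulated = [12.0, 18.0, 33.0, 39.0]
metrics = forecast_metrics(observed, simulated)
```

The concordance coefficient can never exceed the Pearson coefficient in absolute value, which is why it is the stricter of the two when a model is biased.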


The results of this research were also checked with HydroTest
(Statistical Assessment of Hydrological Forecasts), which evaluates 20
statistical measures reported by hydrological modelling studies (Dawson
et al., 2007). Four HydroTest metrics were used for evaluating real data
and modelled data (HEC-RAS, ANN sig-lin and ANN sig-sig), i.e. the 30%
of the data selected for validation.


Results 


Table 2 shows the results of the HEC-RAS simulated model for the six
statistics. As can be seen, a ρc of 0.86 was obtained, indicating close
agreement between real and simulated data in terms of accuracy and
precision. The correlation coefficient (CC) was above 0.90, i.e. within
10% of a perfect correlation. MAE, MSE and RMSE values were also low,
indicating small differences from the real data, and the forecast can
be considered good since the MAPE value (11.95%) fell within the
11%-20% range (Lewis, 1982).


Table 2. Statistical comparison of real data to HEC-RAS simulated data.


Statistical method | ρc | CC | MAE | MAPE | MSE | RMSE
Value | 0.8601 | 0.9077 | 2.2311 | 11.9535 | 15.7725 | 3.9715


The three scenarios with the best ANN values (lowest MAE, MAPE, MSE and
RMSE; highest CC and ρc) were then chosen for each network
configuration. Table 3 shows the best three scenarios for each
configuration obtained with the ANN model.


Table 3. The best ANN scenarios.


Configuration | Scenario | ρc | CC | MAE | MAPE | MSE | RMSE
Sigmoid-sigmoid | Scenario 3 | 0.8639 | 0.9032 | 2.0652 | 13.4254 | 12.2868 | 3.5052
Sigmoid-sigmoid | Scenario 4 | 0.8667 | 0.9035 | 2.0604 | 13.4697 | 12.2299 | 3.4971
Sigmoid-sigmoid | Scenario 9 | 0.8770 | 0.9215 | 1.9007 | 11.9590 | 10.1782 | 3.1903
Sigmoid-linear | Scenario 2 | 0.8729 | 0.9136 | 1.9462 | 12.5194 | 10.9753 | 3.3129
Sigmoid-linear | Scenario 3 | 0.8593 | 0.9108 | 2.0435 | 12.6997 | 11.6512 | 3.4134
Sigmoid-linear | Scenario 6 | 0.8731 | 0.9104 | 1.9834 | 13.0086 | 11.3318 | 3.3663


Discussion


After analysing all the statistical criteria for the best scenarios, it
was determined that scenario 9 (consisting of 20 layers having 25
neurons in each layer) was the best for the sigmoid-sigmoid propagation
function, given that its results met the greatest number of statistical
criteria: least MAE (1.90), least MAPE (11.96%), least MSE (10.2), least
RMSE (3.2), highest CC (0.92) and highest ρc (0.88). Taking the MAPE
result as a reference, the forecast was found to be good, falling in
the 11% to 20% range (Lewis, 1982), and the CC of 0.92 indicated a
strong correlation between real and simulated data.


Regarding the sigmoid-linear propagation function, scenario 2
(consisting of 2 layers having 50 neurons) was chosen as the best
forecast because it had the greatest number of favourable results
regarding the statistical criteria evaluated: least MAE (1.95), least
MAPE (12.5%), least MSE (10.98), least RMSE (3.31) and highest CC
(0.914).


Regarding the number of neurons, the best forecast in both
configurations was obtained with a considerable number of them in each
layer.


Comparing the physical model to artificial intelligence 
models 


A literature search revealed investigations that compared mathematical
models (such as linear regression or multiple regression) to artificial
intelligence systems, concluding that intelligent systems showed closer
agreement between real and simulated data (Aqil et al., 2007; Firat &
Güngör, 2007; Kisi, Shiri, & Nikoofar, 2012; Karimi, Kisi, Shiri, &
Makarynskyy, 2013), thereby providing a better forecast than
mathematical models. However, the search did not find hydraulics or
traditional hydrology studies that made a comparison with a physical
model, which is why this comparison was made. It was found that
traditional statistical criteria (CC, MAE, MAPE, RMSE) were used in
such research but none involved analysis using the concordance
correlation coefficient (ρc), which indicates the relationship between
a model's precision and its accuracy (Firat & Güngör, 2007).

The results were used for comparing the HEC-RAS simulated physical
model to the best two ANN MATLAB models. Table 4 gives the results for
the three models. The data suggest that the models had very similar
forecasts; in decreasing order of effectiveness, they were the
sigmoid-sigmoid ANN, the sigmoid-linear ANN and the HEC-RAS model.


Table 4. Results regarding the best HEC-RAS and ANN models.


Model | ρc | CC | MAE | MAPE | MSE | RMSE
HEC-RAS | 0.8601 | 0.9077 | 2.2311 | 11.9535 | 15.7725 | 3.9715
Sig-sig ANN | 0.8770 | 0.9215 | 1.9007 | 11.9590 | 10.1782 | 3.1903
Sig-lin ANN | 0.8729 | 0.9136 | 1.9462 | 12.5194 | 10.9753 | 3.3129


Figure 3 compares the three models used in the research (HEC-RAS,
sigmoid-sigmoid ANN and sigmoid-linear ANN), plotting real data against
simulated data around a reference line. It should be noted that 30%
of the total flow data was used to validate the models in this simulation.


a) Real data compared to HEC-RAS simulated data
b) Real data compared to ANN (sig-sig) simulated data
c) Real data compared to ANN (sig-lin) simulated data


Figure 3. Real data compared to simulated data for each model.


For the Tocancipá station, the real output flow data plotted against
the simulated data fell both above and below the reference line in this
figure. Data above the line indicate that the flow forecast was
overestimated, and data below the line indicate that it was
underestimated.
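

This reading of the scatter plots can be quantified by counting the share of points on each side of the 1:1 reference line. The Python sketch below assumes that simulated flows are plotted on the vertical axis, so that a point above the line means the simulated value exceeds the observed one; the function name and the sample data are hypothetical.

```python
def share_around_reference_line(observed, simulated):
    """Fractions of forecasts above (over) and below (under) the 1:1 line."""
    n = len(observed)
    over = sum(1 for o, s in zip(observed, simulated) if s > o)
    under = sum(1 for o, s in zip(observed, simulated) if s < o)
    return {"overestimated": over / n, "underestimated": under / n}


# Hypothetical flows in m^3/s: three of four forecasts fall below the line.
observed = [10.0, 20.0, 30.0, 40.0]
simulated = [8.0, 18.0, 33.0, 39.0]
shares = share_around_reference_line(observed, simulated)
```

A model whose underestimated share is close to 1 is systematically biased low, whereas shares near 0.5 each indicate a more homogeneous dispersion around the reference line.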


It can be seen that the HEC-RAS model shown in Figure 3a
underestimated output flow, since most of the data were below the
reference line, meaning that it could not predict flooding levels. In
contrast, the data predicted by the ANN models (sig-sig in Figure 3b
and sig-lin in Figure 3c) had a more homogeneous dispersion, with data
above and below the reference line, indicating that these models would
have a greater possibility of predicting flooding levels corresponding
to high flows at the model's output.


Table 4 shows that the sigmoid-sigmoid configuration resulted in the 
best ANN model. Hydrographs were then drawn after selecting the best 
intelligent model in order to compare real flow data to simulated flow 
data for a period of heavy rainfall, as in April 2011. 


As can be seen, the simulated values in the HEC-RAS model's 
hydrograph shown in Figure 4a were found to be lower than the real 
ones, meaning that the physical model did not properly predict the real 
flows that occurred during that period. The model would thus not be 
reliable for predicting future flooding events. 


Figure 4b shows that the selected ANN model better predicted real flows
during the same period as that shown in Figure 4a; however, it
fluctuated around the real data (as also seen in the scatter plot in
Figure 3b).


a) Real data hydrograph compared to HEC-RAS simulated hydrograph
b) Real data hydrograph compared to (sig-sig) ANN simulated hydrograph for April 2011


Figure 4. Comparing real hydrographs to simulated hydrographs from
the best two models.


Validating the model 


Table 5 compares the statistics calculated with MATLAB to those
calculated with HydroTest, along with three additional statistical criteria:
the R-squared coefficient of determination (RSqr) (Pearson, 1896),
Willmott's index of agreement (IoAd) and the coefficient of efficiency
(CE) (Ablan, Marquez, Rivas, Molina, & Querales, 2011).
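

The three additional criteria can be sketched in Python as follows. The definitions assumed here are the commonly used ones: CE as the Nash-Sutcliffe coefficient of efficiency, IoAd as Willmott's index of agreement, and RSqr as the squared Pearson correlation. The function name and sample series are hypothetical.

```python
def validation_metrics(observed, simulated):
    """Return the coefficient of efficiency, index of agreement and R-squared."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_s = sum(simulated) / n

    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread_o = sum((o - mean_o) ** 2 for o in observed)
    spread_s = sum((s - mean_s) ** 2 for s in simulated)
    co_spread = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated))

    # CE: 1 is a perfect fit; 0 means no better than the observed mean.
    ce = 1.0 - squared_error / spread_o
    # IoAd: squared error relative to the "potential error" about the observed mean.
    potential_error = sum(
        (abs(s - mean_o) + abs(o - mean_o)) ** 2 for o, s in zip(observed, simulated)
    )
    ioad = 1.0 - squared_error / potential_error
    rsqr = co_spread ** 2 / (spread_o * spread_s)
    return {"CE": ce, "IoAd": ioad, "RSqr": rsqr}


# Hypothetical observed and simulated flows in m^3/s.
observed = [10.0, 20.0, 30.0, 40.0]
simulated = [12.0, 18.0, 33.0, 39.0]
scores = validation_metrics(observed, simulated)
```

Unlike RSqr, which only measures linear association, CE and IoAd both drop when the simulated series is systematically shifted from the observations, which makes them more sensitive to a model that underestimates flows.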


Table 5. Comparing modelling statistics to those produced by
HydroTest.


Model | CC (MATLAB) | CC (HydroTest) | MAE (MATLAB) | MAE (HydroTest) | RMSE (MATLAB) | RMSE (HydroTest) | RSqr (HydroTest) | IoAd (HydroTest) | CE (HydroTest)
HEC-RAS | 0.9077 | 0.9076 | 2.2311 | 2.1585 | 3.9715 | 3.7777 | 0.8237 | 0.9309 | 0.7837
ANN sig-sig | 0.9215 | 0.9204 | 1.9007 | 1.8938 | 3.1903 | 3.2106 | 0.8472 | 0.9557 | 0.8437

The CC, MAE and RMSE values obtained with both tools were very close
(varying by tenths or hundredths), indicating that they were correctly
calculated in the analysis performed in this research. The RSqr results
ranged from 0.801 to 0.847, indicating satisfactory models, since these
values are close to 1.0 (Pearson, 1896); it should be noted that the
ANN model's sigmoid-sigmoid configuration was very close to being a
good forecast model. Willmott's index of agreement (IoAd) showed that
the results for the best two simulated models were good, given values
over 0.9, and generally very similar, with a range of 0.93 to 0.95. The
coefficient of efficiency (CE) revealed a larger difference between the
prediction models: the HEC-RAS physical model was rejected because its
value (0.7837) was less than 0.8, whereas the two artificial
intelligence models were found to be satisfactory, with values ranging
from 0.80 to 0.84 (the latter corresponding to the sigmoid-sigmoid ANN
model) (Dawson et al., 2007).


Conclusions 


After observing both models' performances, it was determined that the
physical model underestimated high flow values while the ANN-based
model estimated real values more accurately. However, the scatter
graphs showed more variation with the ANN than with the HEC-RAS model,
although the dispersion in the intelligent model was closer to the
reference line. The same behaviour could be seen in the hydrographs,
where the simulated flows fluctuated even though their values were
close to the real flows.


Good results were observed regarding the models' statistical values,
demonstrating forecasts very close to the real data and highlighting
the techniques' effectiveness: a MAPE of 11.95 to 12.51, indicating a
good forecast; a CC of 0.90 to 0.92, indicating a good relationship
between real and simulated data; a ρc of 0.84 to 0.87, signifying
precision and accuracy regarding a forecast; and RSqr, IoAd and CE >
0.8, indicating satisfactory prediction. These results can be
considered good because both models have low dispersion for middle and
low flows, which represented the largest amount of data used for the
present research.